Abstract

An indexed datatype is a type that uses a parameter as a type-level 
tag; a typical example is the type of vectors, which are indexed over 
a type-level natural number encoding their length. Since the intro- 
duction of generalised algebraic datatypes, indexed datatypes have 
become commonplace in Haskell. Values of indexed datatypes are 
often more involved than values of plain datatypes, and program- 
mers would benefit from having generic programs on indexed da- 
tatypes. However, no generic programming library adequately sup- 
ports them, leaving programmers with the tedious task of writing 
repetitive code. 

We show how to encode indexed datatypes in a generic pro- 
gramming library with type families and type-level representations 
in Haskell. Our approach can also be used in similar libraries, and 
is fully backwards-compatible. We show not only how to encode 
indexed datatypes generically, but also how to instantiate generic 
functions on indexed datatypes. Furthermore, all generic represen- 
tations and instances are generated automatically, making life eas- 
ier for users. 

Categories and Subject Descriptors D.1.1 [Programming Tech-
niques]: Functional Programming

General Terms Languages 

1. Introduction 

Runtime errors are undesirable and annoying. Fortunately, the 
strong type system of Haskell eliminates many common program- 
mer mistakes that lead to runtime errors, like unguarded casts. 
However, even in standard Haskell, runtime errors still occur often. 
A typical example is the error of calling head on an empty list. 

Indexed datatypes, popular since the introduction of General-
ized Algebraic Datatypes (GADTs, Peyton Jones et al. 2006), allow
us to avoid calling head on an empty list and many other runtime er- 
rors by encoding further information at the type level. For instance, 
one can define a type of lists with a known length, and then define 
head in such a way that it only accepts lists of length greater than 
zero. This prevents the usual mistake by guaranteeing statically that 
head is never called on an empty list. 

Datatype-generic programming (Gibbons 2007) is a program-
ming technique that increases abstraction and reduces code dupli-
cation. Given the regular structure of inductive algebraic datatypes
in Haskell, there is a small set of primitive operations upon which 
all datatypes are built. Datatype-generic programming exploits the 
isomorphism between a datatype and its representation, using only 
a small set of primitive types to provide functions that operate simi- 
larly on any datatype. Many functions depend only on the structure 
of the datatype, and can therefore be defined generically. Typical 
examples are equality, enumeration, conversion to and from strings 
or binary encodings, traversals, etc. 

Unfortunately, datatype-generic programming and indexed da- 
tatypes do not mix well. The added complexity of the indices and 
associated type-level computations needs to be encoded in a generic 
fashion, and while this is standard in dependently-typed approaches 
to generic programming, we know of no Haskell approach dealing 
with indexed datatypes. In fact, even the standard deriving mech- 
anism, which automatically generates instances for certain type 
classes, fails to work for GADTs, in general. 

We argue that it is time to allow these two concepts to mix. 
Driven by an application that makes heavy use of both generic pro-
gramming and indexed datatypes (Magalhães and De Haas 2011),
we have developed an extension to a current generic programming 
library to support indexed datatypes. Our extension is conservative, 
in that it preserves all library functionality without requiring mod- 
ifications to client code, and general, as it applies equally well to 
other libraries. Furthermore, we show that instantiating functions 
to indexed datatypes is not trivial, even in the non-generic case. 
In the context of datatype-generic programming, however, it is es- 
sential to be able to easily instantiate functions; otherwise, we lose 
the simplicity and reduced code duplication we seek. Therefore we 
show a way of automatically instantiating generic functions to in- 
dexed datatypes, which works for most types of generic functions. 

The rest of this paper is organized as follows: we first intro-
duce generic programming briefly in Section 2 and define indexed
datatypes in Section 3. Section 4 deals with representing indexed
datatypes generically, and Section 5 focuses on the problem of in-
stantiation. Section 6 presents a general algorithm for automating
the procedures described in the preceding two sections, and Sec-
tion 7 deals with lifting a limitation of our encoding. Finally, we
show related work in Section 8, present directions for future re-
search in Section 9, and conclude in Section 10.



2. Generic programming with type families 

In this paper we use a lightweight generic programming library 
using type-level representations with type families in a style similar 
to that first described by Chakravarty et al. (2009) and used by Van
Noort et al. (2010), which we call instant-generics (the same
name as its Hackage package).
2.1 Generic representation

The basic idea in datatype-generic programming is to devise a small 
number of primitive types that can be used to define a large number 
of derived types. If we can represent many complicated types using 
a small number of primitive types, we can also define functions that 
operate on the primitive types, and make these functions work on all 
types by converting to and from the primitive types appropriately. 
We call these primitive types the representation types, as they are 
used to represent all other types. 

The instant-generics library uses the following representa-
tion types:

infixr 5 +
infixr 6 ×

data α + β   = L α | R β
data α × β   = α × β
data C γ α   = C α
data U       = U
data Var α   = Var α
data Rec α   = Rec α

As usual, we use sums to encode alternatives (different construc- 
tors), and products to encode multiple arguments to a construc- 
tor. Constructors are tagged with C, so that we can store meta- 
information such as constructor name, fixity, etc. After tagging with 
C, constructors without any arguments are encoded using the unit 
type U, and every argument to a constructor is further wrapped in a 
Var or Rec tag to indicate if the argument is a parameter of the type 
or a (potentially recursive) occurrence of a datatype. 

To mediate between a value and its generic representation we 
use a type class: 

class Representable α where
  type Rep α
  to   :: Rep α -> α
  from :: α -> Rep α

A Representable type has an associated Representation type, which 
is constructed using the types shown previously. We use a type 
family (Schrijvers et al. 2008) to encode the isomorphism between
a type and its representation, together with conversion functions to 
and from. 

As an example, we show the instantiation of the standard list 
datatype: 

data List_Nil
instance Constructor List_Nil where conName _ = "[]"

data List_Cons
instance Constructor List_Cons where
  conName _   = ":"
  conFixity _ = Infix RightAssociative 5

instance Representable [α] where
  type Rep [α] = C List_Nil U
               + C List_Cons (Var α × Rec [α])
  from []       = L (C U)
  from (a : as) = R (C (Var a × Rec as))
  to (L (C U))                 = []
  to (R (C (Var a × Rec as))) = a : as

The first lines deal with the constructor meta-information. Ev- 
ery constructor of the original datatype gives rise to an empty 
datatype, used as the first argument to C in the representation. The 
instances of the Constructor class provide the meta-information. 
Lists have two constructors: the empty case corresponds to a Left
injection, and the cons case to a Right injection. For the cons case
we have two parameters, encoded using a product. The first one is 
a Variable, and the second a Recursive occurrence of the list type. 

2.2 Generic functions 

Generic functions are defined by giving an instance for each repre- 
sentation type. We show two examples of generic functions: equal- 
ity and enumeration. 

2.2.1 Equality 

We start the definition of generic equality with a type class and 
instances for the sum and product cases: 

class GEq' α where
  geq' :: α -> α -> Bool

instance (GEq' α, GEq' β) => GEq' (α + β) where
  geq' (L a) (L b) = geq' a b
  geq' (R a) (R b) = geq' a b
  geq' _     _     = False

instance (GEq' α, GEq' β) => GEq' (α × β) where
  geq' (a × b) (a' × b') = geq' a a' && geq' b b'

The instance for sums checks that both arguments are injected on 
the same side, and proceeds recursively. Products proceed recur- 
sively, requiring both arguments to be equal. 

Units are trivially equal, and Constructors are equal if their
arguments are equal:

instance GEq' U where
  geq' U U = True

instance (GEq' α) => GEq' (C γ α) where
  geq' (C a) (C b) = geq' a b

Finally, variables and recursive occurrences simply call another 
type-class for equality: 

instance (GEq α) => GEq' (Var α) where
  geq' (Var a) (Var b) = geq a b

instance (GEq α) => GEq' (Rec α) where
  geq' (Rec a) (Rec b) = geq a b

This new type class GEq is used to aggregate all types we can 
compare for equality (be it generically or not): 

class GEq α where
  geq :: α -> α -> Bool

We can give ad-hoc instances for base types: 

instance GEq Char where
  geq = (==)

But what we mostly want is to be able to use the generic instances 
of the GEq' class. For this we need a default implementation, which 
defines how to apply generic equality to a representable type: 

geqDefault :: (Representable α, GEq' (Rep α))
           => α -> α -> Bool
geqDefault x y = geq' (from x) (from y)

If a type is representable and its representation has an instance of 
GEq', we can compute generic equality by first converting from the 
original datatype and then calling geq' on the generic representa- 
tion. All we are missing is a GEq instance for lists, now trivial: 

instance (GEq α) => GEq [α] where
  geq = geqDefault
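
As a minimal sketch of how the definitions of this section fit together, the fragment below transcribes them into plain GHC Haskell. It is not the instant-generics source: the ASCII operators `:+:` and `:*:` stand in for + and ×, the tag types `List_Nil` and `List_Cons` are named by assumption, and the `Constructor` meta-information class is left out because equality does not need it.

```haskell
{-# LANGUAGE TypeFamilies, TypeOperators, FlexibleContexts #-}

infixr 5 :+:
infixr 6 :*:

-- Representation types; :+: and :*: are ASCII stand-ins for + and ×.
data a :+: b = L a | R b
data a :*: b = a :*: b
data C c a   = C a
data U       = U
data Var a   = Var a
data Rec a   = Rec a

class Representable a where
  type Rep a
  to   :: Rep a -> a
  from :: a -> Rep a

-- Empty tag types for the two list constructors (names are assumptions).
data List_Nil
data List_Cons

instance Representable [a] where
  type Rep [a] = C List_Nil U :+: C List_Cons (Var a :*: Rec [a])
  from []       = L (C U)
  from (x : xs) = R (C (Var x :*: Rec xs))
  to (L (C U))                   = []
  to (R (C (Var x :*: Rec xs))) = x : xs

-- Equality on representation types.
class GEq' a where
  geq' :: a -> a -> Bool

instance (GEq' a, GEq' b) => GEq' (a :+: b) where
  geq' (L x) (L y) = geq' x y
  geq' (R x) (R y) = geq' x y
  geq' _     _     = False

instance (GEq' a, GEq' b) => GEq' (a :*: b) where
  geq' (x :*: y) (x' :*: y') = geq' x x' && geq' y y'

instance GEq' U where
  geq' U U = True

instance GEq' a => GEq' (C c a) where
  geq' (C x) (C y) = geq' x y

instance GEq a => GEq' (Var a) where
  geq' (Var x) (Var y) = geq x y

instance GEq a => GEq' (Rec a) where
  geq' (Rec x) (Rec y) = geq x y

-- Equality on user-facing types.
class GEq a where
  geq :: a -> a -> Bool

instance GEq Char where
  geq = (==)

geqDefault :: (Representable a, GEq' (Rep a)) => a -> a -> Bool
geqDefault x y = geq' (from x) (from y)

instance GEq a => GEq [a] where
  geq = geqDefault
```

Loaded into GHCi, `geq "abc" "abc"` should hold while `geq "abc" "abd"` and `geq "ab" "abc"` should not, since the sum instance rejects values injected on different sides.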
2.2.2 Enumeration

Often generic functions are divided into three groups: consumers,
transformers, and producers. Generic consumers, like equality, take
a representable value and consume it into some constant type.
Generic transformers, like fmap, take a representable type, trans-
form it in some way, but return an element of the same type.
Generic producers, like read, take some constant type and pro-
duce a representable type out of it. This distinction is important
because often functions of the same group are defined in a simi-
lar style. Also, generic producers tend to be more problematic than
consumers or transformers (Rodriguez Yakushev et al. 2008).

As an example of a generic producer, we show generic enumer-
ation:

class GEnum' α where
  genum' :: [α]

instance GEnum' U where
  genum' = [U]

instance (GEnum α) => GEnum' (Rec α) where
  genum' = map Rec genum

instance (GEnum α) => GEnum' (Var α) where
  genum' = map Var genum

instance (GEnum' α) => GEnum' (C γ α) where
  genum' = map C genum'

We represent enumerations simply as lists of elements. There is
only one unit, and the recursive and variable cases rely on another
type-class, just as in the equality function. Constructors are gener-
ated by recursive invocation.

The most interesting cases are for sums, where we have a choice
of what to generate, and products, where we have to combine all
possible generated values:

instance (GEnum' α, GEnum' β) => GEnum' (α + β) where
  genum' = map L genum' ++ map R genum'

instance (GEnum' α, GEnum' β) => GEnum' (α × β) where
  genum' = [x × y | x <- genum', y <- genum']

It is better to replace (++) with an operator that alternates the
elements of the lists, and the list product with a diagonalization,
thereby guaranteeing that every element of the datatype will even-
tually be generated. For the purposes of this paper those details are
unimportant, so we take a simplistic implementation.

The default generic implementation simply maps the conversion
function over the list of generated elements:

genumDefault :: (Representable α, GEnum' (Rep α)) => [α]
genumDefault = map to genum'

We now define the top-level class GEnum and give ad-hoc
instances for Int (simplified) and Bool, and a generic instance for
lists:

class GEnum α where
  genum :: [α]

instance GEnum Int where
  genum = [0..5]

instance GEnum Bool where
  genum = [True, False]

instance GEnum α => GEnum [α] where
  genum = genumDefault

Finally, we can generate lists of integers, for instance:
take 5 (genum :: [[Int]]) evaluates to
[[], [0], [0,0], [0,0,0], [0,0,0,0]].

2.3 Similarities to other libraries

We chose to present instant-generics only for its simplicity;
we know of at least three other libraries which use type families
and type classes in a very similar way: regular (Van Noort et
al. 2008), which further allows for mapping over container types;
multirec (Rodriguez Yakushev et al. 2009), which further allows
catamorphisms over mutually-recursive families of datatypes; and
generic-deriving (Magalhães et al. 2010), which merges in-
stant-generics and regular and is due to be implemented in the
Glasgow Haskell Compiler (GHC). These libraries work in a sim-
ilar way to instant-generics, and the modification we describe
in the next sections applies equally well to all of them.
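
The generic enumeration of Section 2.2.2 can likewise be sketched in plain GHC Haskell. The fragment below rests on a few assumptions: it uses ASCII operators `:+:` and `:*:` for + and ×, keeps only the `to` half of `Representable` (a producer never calls `from`), names the list tag types `List_Nil` and `List_Cons` itself, and adds a helper `firstLists` for illustration. It keeps the simplistic `++` and list-product strategy described in the text.

```haskell
{-# LANGUAGE TypeFamilies, TypeOperators, FlexibleContexts #-}

infixr 5 :+:
infixr 6 :*:

data a :+: b = L a | R b
data a :*: b = a :*: b
data C c a   = C a
data U       = U
data Var a   = Var a
data Rec a   = Rec a

-- Only the 'to' direction is needed by a producer (simplification).
class Representable a where
  type Rep a
  to :: Rep a -> a

data List_Nil
data List_Cons

instance Representable [a] where
  type Rep [a] = C List_Nil U :+: C List_Cons (Var a :*: Rec [a])
  to (L (C U))                   = []
  to (R (C (Var x :*: Rec xs))) = x : xs

class GEnum' a where
  genum' :: [a]

instance GEnum' U where
  genum' = [U]

instance GEnum a => GEnum' (Rec a) where
  genum' = map Rec genum

instance GEnum a => GEnum' (Var a) where
  genum' = map Var genum

instance GEnum' a => GEnum' (C c a) where
  genum' = map C genum'

-- Simplistic: all left injections first, then all right injections.
instance (GEnum' a, GEnum' b) => GEnum' (a :+: b) where
  genum' = map L genum' ++ map R genum'

-- Simplistic: plain list product, no diagonalization.
instance (GEnum' a, GEnum' b) => GEnum' (a :*: b) where
  genum' = [x :*: y | x <- genum', y <- genum']

class GEnum a where
  genum :: [a]

genumDefault :: (Representable a, GEnum' (Rep a)) => [a]
genumDefault = map to genum'

instance GEnum Int where
  genum = [0 .. 5]

instance GEnum Bool where
  genum = [True, False]

instance GEnum a => GEnum [a] where
  genum = genumDefault

-- Hypothetical helper: the first five enumerated lists of Ints.
firstLists :: [[Int]]
firstLists = take 5 genum
```

Because the product instance exhausts the (infinite) list of tails before moving to the next head, `firstLists` only ever uses the element 0, which is exactly the behaviour that an alternating sum and a diagonalizing product would avoid.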

3. Indexed datatypes 

While the libraries described in the previous section already allow 
a wide range of datatypes to be handled in a generic fashion, they 
cannot deal with indexed datatypes. We call a datatype indexed if it 
has a type parameter that is not used as data (also called a phantom 
type parameter), but at least one of the datatype's constructors 
introduces type-level constraints on this type. The type of vectors, 
or size-constrained lists, is an example of such a datatype: 

data Vec α ν where
  Nil  :: Vec α Ze
  Cons :: α -> Vec α ν -> Vec α (Su ν)

The first parameter of Vec, α, is the type of the elements of the
vector. In the GADT syntax above with type signatures for each
constructor, we see that α appears as an argument to the Cons
constructor; α is a regular type parameter. On the other hand, the
second parameter of Vec, ν, does not appear as a direct argument
to any constructor: it is only used to constrain the possible ways
of building Vecs. We always instantiate ν with the following empty
(uninhabited) datatypes:

data Ze
data Su ν
type 0_T = Ze
type 1_T = Su 0_T
type 2_T = Su 1_T

A vector with two Chars, for instance, is represented as follows: 

exampleVec :: Vec Char 2_T
exampleVec = Cons 'p' (Cons 'q' Nil)

Note that its type, Vec Char 2_T, adequately encodes the length
of the vector; giving any other type to exampleVec, such as
Vec Char 0_T or Vec Char Char, would result in a type error.

Indexed types are easy to define as a GADT, and allow us to 
give more specific types to our functions. For instance, the type of 
vectors above allows us to avoid the usual empty list error when 
taking the first element of an empty list, since we can define a head 
function that does not accept empty vectors: 

head_Vec :: Vec α (Su ν) -> α
head_Vec (Cons x _) = x

GHC correctly recognizes that it is not necessary to specify a case
for head_Vec Nil, since that is guaranteed by the type-checker never
to happen.

Indexed datatypes are also useful when specifying well-typed 
embedded languages: 

data Term α where
  Lit    :: Int -> Term Int
  IsZero :: Term Int -> Term Bool
  Pair   :: Term α -> Term β -> Term (α, β)
  If     :: Term Bool -> Term α -> Term α -> Term α

The constructors of Term specify the types of the arguments they 
require and the type of term they build. We will use the datatypes 
Vec and Term as representative examples of indexed datatypes in 
the rest of this paper. 

3.1 Type-level equalities and existential quantification 

Indexed datatypes such as Vec and Term can be defined using 
only existential quantification and type-level equalities (Johann and
Ghani 2008, Theorem 3). For example, GHC rewrites Vec to the
following equivalent datatype:

data Vec α ν =       ν ~ Ze   => Nil
             | ∀μ. ν ~ Su μ => Cons α (Vec α μ)

The constructor Nil introduces the constraint that the type variable
ν equals Ze; α ~ β is GHC's notation for type-level equality be-
tween types α and β. The Cons constructor requires ν to be a
Su of something; this "something" is encoded by introducing an
existentially-quantified variable μ, which stands for the length of
the sublist, and restricting ν to be the successor of μ (in other words,
one plus the length of the sublist).

This encoding of Vec is entirely equivalent to the one shown
previously. While it may seem more complicated, it makes explicit
what happens "behind the scenes" when using the Nil and Cons
constructors: Nil can only be used if ν can be unified with Ze,
and Cons introduces a new type variable μ, constrained to be
the predecessor of ν. In the next section we will look at how to
encode indexed datatypes generically; for this we need to know
what kind of primitive operations we need to support. Looking at
this definition of Vec, it is clear that we will need not only a way
of encoding type equality constraints, but also introduction of new
type variables.

3.2 Functions on indexed datatypes 

The extra type safety gained by using indexed datatypes comes at 
a price: defining functions operating on these types can be harder. 
Consumer functions are not affected; we can easily define an eval- 
uator for Term, for instance: 

eval :: Term α -> α
eval (Lit i)    = i
eval (IsZero t) = eval t == 0
eval (Pair a b) = (eval a, eval b)
eval (If p a b) = if eval p then eval a else eval b

In fact, even GHC can automatically derive consumer functions
on indexed datatypes for us; the following Show instances work as
expected:

deriving instance Show α => Show (Vec α ν)
deriving instance Show (Term α)

Things get more complicated when we look at producer func- 
tions. Let us try to define a function to enumerate values. For lists 
this is simple: 

enum_List :: [α] -> [[α]]
enum_List ea = [] : [x : xs | x <- ea, xs <- enum_List ea]

Given an enumeration of all possible element values, we generate
all possible lists, starting with the empty list.¹ However, a similar
version for Vec is rejected by the compiler:

enum_Vec :: [α] -> [Vec α ν]
enum_Vec ea = Nil : [Cons x xs | x <- ea, xs <- enum_Vec ea]

¹ Once again we are not diagonalizing the elements from ea and
enum_List ea, but that is an orthogonal issue.

GHC complains of being unable to match Ze with Su ν, and right-
fully so: we try to add Nil, of type Vec α Ze, to a list containing
Conses, of type Vec α (Su ν). To make this work we can use type
classes:

instance GEnum (Vec α Ze) where
  genum = [Nil]

instance (GEnum α, GEnum (Vec α ν))
    => GEnum (Vec α (Su ν)) where
  genum = [Cons a t | a <- genum, t <- genum]

In this way we can provide different types (and implementations) 
to the enumeration of empty and non-empty vectors. 
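
The two instances above can be tried out in isolation. The following is a minimal sketch, assuming a stand-alone `GEnum` class with only a `Bool` base instance; the helpers `vecToList` and `twoBools` are hypothetical and serve only to make the result observable as ordinary lists.

```haskell
{-# LANGUAGE GADTs, FlexibleInstances, FlexibleContexts #-}

data Ze
data Su n

data Vec a n where
  Nil  :: Vec a Ze
  Cons :: a -> Vec a n -> Vec a (Su n)

-- Hypothetical helper: forget the length index.
vecToList :: Vec a n -> [a]
vecToList Nil         = []
vecToList (Cons x xs) = x : vecToList xs

class GEnum a where
  genum :: [a]

instance GEnum Bool where
  genum = [True, False]

-- Length zero: exactly one vector.
instance GEnum (Vec a Ze) where
  genum = [Nil]

-- Length n+1: every element consed onto every vector of length n.
instance (GEnum a, GEnum (Vec a n)) => GEnum (Vec a (Su n)) where
  genum = [Cons x xs | x <- genum, xs <- genum]

-- All Boolean vectors of length two, viewed as lists.
twoBools :: [[Bool]]
twoBools = map vecToList (genum :: [Vec Bool (Su (Su Ze))])
```

Instance selection is driven entirely by the index: asking for `Vec Bool (Su (Su Ze))` unfolds the second instance twice and the first instance once, so `twoBools` contains the four vectors of length two and nothing else.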

Note that GHC (version 7.0.3) is not prepared to derive pro- 
ducer code for indexed datatypes. Trying to derive an instance 
Read (Vec α ν) results in the generation of type-incorrect code. We
show in Section 5.2 a way of identifying the necessary instances to
be defined, which could also be used to fix this issue. 

4. Handling indexing generically 

As we have seen in the previous section, to handle indexed data- 
types generically we need support for type equalities and quantifi-
cation in the generic representation. We deal with the former in
Section 4.1 and the latter in Section 4.2.

4.1 Type equalities 

A general type equality α ~ β can be encoded in a simple GADT:

data α ~ β where
  Refl :: α ~ α

We could add the ~ type to the representation types of in-
stant-generics and add type equalities as extra arguments to
constructors. However, since the equalities are always introduced
at the constructor level, and we have a representation type to encode
constructors, we prefer to define a more general representation type
for constructors which also introduces a type equality:

data CEq γ φ ψ α where
  CEq :: α -> CEq γ φ φ α

The new CEq type takes two extra parameters which are forced
to unify by the CEq constructor. The old behavior of C can be
recovered by instantiating the φ and ψ parameters to trivially equal
types:

type C γ α = CEq γ () () α

Note that we can encode multiple equalities as a product of
equalities. For example, a constructor which introduces the equality
constraints α ~ Int and β ~ Char would be encoded with a represen-
tation of type CEq γ (α × β) (Int × Char) δ (for suitable γ and
δ).

4.1.1 Encoding types with equality constraints 

At this stage we are ready to encode types with equality constraints 
that do not rely on existential quantification; the ~ type shown 
before is a good example: 

instance Representable (α ~ β) where
  type Rep (α ~ β) = CEq ~_Refl α β U
  from Refl    = CEq U
  to   (CEq U) = Refl

The type equality introduced by the Refl constructor maps directly
to the equality introduced by CEq, and vice-versa. As Refl has
no arguments, we encode it with the unit representation type U.
The auxiliary datatype ~_Refl, which we omit, is used to encode
constructor information about Refl, as usual in instant-gener-
ics.

4.1.2 Generic functions over equality constraints 

We need to provide instances for the new CEq representation type
for each generic function. The instances for the equality and enu-
meration functions of Section 2.2 are:

instance (GEq' α) => GEq' (CEq γ φ ψ α) where
  geq' (CEq a) (CEq b) = geq' a b

instance (GEnum' α) => GEnum' (CEq γ φ φ α) where
  genum' = map CEq genum'

instance GEnum' (CEq γ φ ψ α) where
  genum' = []

Generic consumers, such as equality, are generally not affected by 
the equality constraints. We do not bother requiring that φ and
ψ are equal because there is no way to build a value which does
not obey that restriction. Generic producers are somewhat trickier, 
because we are now trying to build a generic representation, and 
thus must take care not to build impossible cases. For generic 
enumeration, we proceed normally in case the types unify, and 
return the empty enumeration in case the types are different. Note 
that these two instances overlap, but remain decidable. 
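
The behaviour of the two overlapping enumeration instances can be observed on a small, self-contained fragment. This is a sketch under assumptions: it uses the per-instance `OVERLAPPING` pragma of recent GHC versions rather than whatever mechanism the library itself relies on, and `Tag`, `countSame` and `countDiff` are hypothetical names introduced only for the demonstration.

```haskell
{-# LANGUAGE GADTs, FlexibleInstances #-}

data U = U

-- CEq c p q a can only be built when p and q are the same type.
data CEq c p q a where
  CEq :: a -> CEq c p p a

class GEnum' a where
  genum' :: [a]

instance GEnum' U where
  genum' = [U]

-- The two indices unify: enumerate the payload as usual.
instance {-# OVERLAPPING #-} GEnum' a => GEnum' (CEq c p p a) where
  genum' = map CEq genum'

-- The two indices differ: the type is uninhabited, so enumerate nothing.
instance GEnum' (CEq c p q a) where
  genum' = []

-- Hypothetical constructor tag, standing in for the meta-information type.
data Tag

countSame :: Int
countSame = length (genum' :: [CEq Tag Int Int U])

countDiff :: Int
countDiff = length (genum' :: [CEq Tag Int Bool U])
```

Here `countSame` is 1 and `countDiff` is 0: the more specific instance is chosen only when both indices are known to coincide, and the fallback instance never needs to construct a `CEq` value.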

4.2 Existentially-quantified indices 

Recall the shape of the Cons constructor of the Vec datatype: 

∀μ. ν ~ Su μ => Cons α (Vec α μ)

We need to be able to introduce new type variables in the type
representation. A first idea would be something like:

type Rep (Vec α ν) = ∀μ. CEq Vec_Cons ν (Su μ) ...

This however is not accepted by GHC, as the right-hand side of a 
type family instance cannot contain quantifiers. This restriction is 
well justified, as allowing this would lead to higher-order unifica- 
tion problems. 

Another attempt would be to encode representations as data 
families instead of type families, so that we can use regular exis- 
tential quantification: 

data instance Rep (Vec α ν) =
  ∀μ. Rep_Vec (CEq Vec_Cons ν (Su μ) ...)

However, we do not want to use data families to encode the generic 
representation, as these introduce a new constructor per datatype, 
thereby effectively precluding a generic treatment of all types. 

4.2.1 Faking existentials 

Since the conventional approaches do not work, we turn to some 
more unconventional approaches. All we have is an index type vari-
able ν, and we need to generate existentially-quantified variables
that are constrained by ν. We know that we can use type families to
create new types from existing types, so let us try that. We introduce
a type family

type family X ν

and we will use X ν where the original type uses μ. We can now
write a generic representation for Vec:

instance Representable (Vec α ν) where
  type Rep (Vec α ν) =
      CEq Vec_Nil  ν Ze         U
    + CEq Vec_Cons ν (Su (X ν)) (Var α × Rec (Vec α (X ν)))
  from Nil        = L (CEq U)
  from (Cons h t) = R (CEq (Var h × Rec t))
  to (L (CEq U))               = Nil
  to (R (CEq (Var h × Rec t))) = Cons h t

This is a good start, but we are not done yet, as GHC refuses to 
accept the code above with the following error: 

Could not deduce (m ~ X (Su m))
from the context (n ~ Su m)
  bound by a pattern with constructor
    Cons :: forall a n. a -> Vec a n -> Vec a (Su n),
  in an equation for 'from'

What does this mean? GHC is trying to unify μ with X (Su μ),
when it only knows that ν ~ Su μ. The equality ν ~ Su μ comes
from the pattern match on Cons, but why is it trying to unify μ
with X (Su μ)? Well, on the right-hand side we use CEq with type
CEq Vec_Cons ν (Su (X ν)) ..., so GHC tries to prove the equality
ν ~ Su (X ν). In trying to do so, it replaces ν by Su μ, which leaves
Su μ ~ Su (X (Su μ)), which is implied by μ ~ X (Su μ), but GHC
cannot find a proof of the latter equality.

This is unsurprising, since indeed there is no such proof. Fortu- 
nately we can supply it by giving an appropriate type instance: 

type instance X (Su μ) = μ

We call instances such as the one above "mobility rules", as they 
allow the index to "move" through indexing type constructors 
(such as Su) and X. Adding the type instance above makes the 
Representable instance for Vec compile correctly. Note also how X 
behaves much like an extraction function, getting the parameter of 
Su. 
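
Putting the pieces of this subsection together gives the following sketch of the Vec representation in plain GHC Haskell. As assumptions, it uses ASCII operators `:+:` and `:*:` for + and ×, names the constructor tags `Vec_Nil` and `Vec_Cons`, and adds two hypothetical helpers, `vecToList` and `roundTrip`, to check that converting to the representation and back is the identity.

```haskell
{-# LANGUAGE GADTs, TypeFamilies, TypeOperators, FlexibleContexts #-}

infixr 5 :+:
infixr 6 :*:

data a :+: b = L a | R b
data a :*: b = a :*: b
data U       = U
data Var a   = Var a
data Rec a   = Rec a

-- Constructor representation carrying an equality between p and q.
data CEq c p q a where
  CEq :: a -> CEq c p p a

class Representable a where
  type Rep a
  to   :: Rep a -> a
  from :: a -> Rep a

data Ze
data Su n

data Vec a n where
  Nil  :: Vec a Ze
  Cons :: a -> Vec a n -> Vec a (Su n)

-- Constructor tags (names are assumptions).
data Vec_Nil
data Vec_Cons

-- X n stands for the existentially quantified predecessor of n.
type family X n
type instance X (Su m) = m   -- the mobility rule

instance Representable (Vec a n) where
  type Rep (Vec a n) =
        CEq Vec_Nil  n Ze         U
    :+: CEq Vec_Cons n (Su (X n)) (Var a :*: Rec (Vec a (X n)))
  from Nil        = L (CEq U)
  from (Cons h t) = R (CEq (Var h :*: Rec t))
  to (L (CEq U))                   = Nil
  to (R (CEq (Var h :*: Rec t)))  = Cons h t

-- Hypothetical helpers for observing the result.
vecToList :: Vec a n -> [a]
vecToList Nil         = []
vecToList (Cons x xs) = x : vecToList xs

roundTrip :: Vec a n -> Vec a n
roundTrip = to . from
```

Removing the `type instance X (Su m) = m` line should bring back an error of the kind quoted above, since nothing then connects the pattern-bound length with `X n`.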



Representation for Term. The Term datatype (shown in Sec-
tion 3) can be represented generically using the same technique.
First let us write Term with explicit quantification and type equali-
ties:

data Term α =
                α ~ Int    => Lit Int
  |             α ~ Bool   => IsZero (Term Int)
  | ∀β γ. α ~ (β, γ) => Pair (Term β) (Term γ)
  |                           If (Term Bool) (Term α) (Term α)

We see that the Lit and IsZero constructors introduce type equal- 
ities, and the Pair constructor abstracts from two variables. This 
means we need two type families: 

type family X_1 α
type family X_2 α

Since this strategy could require introducing potentially many type
families, we use a single type family instead, parametrized over two
other arguments:

type family X γ ι α

We instantiate the γ parameter to the constructor representation
type, ι to a type-level natural indicating the index of the introduced
variable, and α to the datatype index itself.
The representation for Term becomes: 

type Rep_Term α =
    CEq Term_Lit    α Int  (Rec Int)
  + CEq Term_IsZero α Bool (Rec (Term Int))
  + CEq Term_Pair   α (X Term_Pair 0_T α, X Term_Pair 1_T α)
      (  Rec (Term (X Term_Pair 0_T α))
       × Rec (Term (X Term_Pair 1_T α)))
  + C Term_If
      (Rec (Term Bool) × Rec (Term α) × Rec (Term α))

We show only the representation type Rep_Term, as the from and to
functions are trivial. The mobility rules are induced by the equality
constraint of the Pair constructor:

type instance X Term_Pair 0_T (β, γ) = β
type instance X Term_Pair 1_T (β, γ) = γ

Again, the rules resemble selection functions, extracting the first 
and second components of the pair. 
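
That the mobility rules act as selection functions can be checked directly at the type level. The sketch below assumes the three-parameter family X with unary type-level naturals `Ze` and `Su Ze` in place of 0_T and 1_T; `Term_Pair` is an empty tag type, and `firstComponent` and `secondComponent` are hypothetical names whose signatures only type-check if the rules reduce as described.

```haskell
{-# LANGUAGE TypeFamilies #-}

data Ze
data Su n

-- Constructor tag for Pair (name is an assumption).
data Term_Pair

-- X c i a: the i-th quantified variable of constructor c at index a.
type family X c i a

type instance X Term_Pair Ze      (b, c) = b
type instance X Term_Pair (Su Ze) (b, c) = c

-- 'id' is accepted only because X Term_Pair Ze (Int, Bool) reduces to Int.
firstComponent :: X Term_Pair Ze (Int, Bool) -> Int
firstComponent = id

-- Likewise, X Term_Pair (Su Ze) (Int, Bool) reduces to Bool.
secondComponent :: X Term_Pair (Su Ze) (Int, Bool) -> Bool
secondComponent = id
```

Using a single family indexed by constructor tag and position keeps the number of declared families constant, at the cost of two extra arguments at every use site.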

Summarising, quantified variables are represented as type fam-
ilies, and type equalities are encoded directly in the new CEq rep-
resentation type. Type equalities on quantified variables need mo-
bility rules, represented by type instances. We have seen this based
on two example datatypes; in Section 6 we describe more formally
how to encode indexed datatypes in the general case.

5. Instantiating generic functions 

Now that we know how to represent indexed datatypes, we pro- 
ceed to instantiate generic functions on these types. We split the 
discussion into generic consumers and producers, as they require a 
different approach. 

5.1 Generic consumers 

Instantiating generic equality to the Vec and Term types is unsur- 
prising: 

instance (GEq α) => GEq (Vec α ν) where
  geq = geqDefault

instance GEq (Term α) where
  geq = geqDefault

Using the instance for generic equality on CEq of Section 4.1.2,
these instances compile and work fine. The instantiation of generic
consumers to indexed datatypes is therefore no more complex than 
to standard datatypes. 

5.2 Generic producers 

Instantiating generic producers is more challenging, as we have 
seen in Section 3.2. For Vec, a first attempt could be:

instance (GEnum α) => GEnum (Vec α ν) where
  genum = genumDefault

However, this will always return the empty list: we do not know
what ν is, so we cannot assume it to be Ze, Su Ze, or anything else.
It could even be something nonsensical such as Int, so the only
possible thing to return is the empty list. Instead, as before, we give
two instances, one for Vec α Ze, and another for Vec α (Su ν), given
an instance for Vec α ν:

instance (GEnum α) => GEnum (Vec α Ze) where
  genum = genumDefault

instance (GEnum α, GEnum (Vec α ν))
    => GEnum (Vec α (Su ν)) where
  genum = genumDefault

We can check that this works as expected by enumerating all the
vectors of Booleans of length one: genum :: [Vec Bool (Su Ze)] eval-
uates to [Cons True Nil, Cons False Nil], the two possible combina-
tions.
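
The check above can be reproduced end to end with the sketch below, which combines the Vec representation, the overlapping CEq instances, and the two index-specific GEnum instances. Several points are assumptions rather than the library's own code: the ASCII operators `:+:` and `:*:`, the tag names `Vec_Nil` and `Vec_Cons`, a `Representable` class reduced to `to`, the per-instance `OVERLAPPING` pragma of recent GHC versions, and the hypothetical helpers `vecToList` and `lengthOne`.

```haskell
{-# LANGUAGE GADTs, TypeFamilies, TypeOperators #-}
{-# LANGUAGE FlexibleContexts, FlexibleInstances #-}

infixr 5 :+:
infixr 6 :*:

data a :+: b = L a | R b
data a :*: b = a :*: b
data U       = U
data Var a   = Var a
data Rec a   = Rec a
data CEq c p q a where
  CEq :: a -> CEq c p p a

-- Only 'to' is needed by a producer (simplification).
class Representable a where
  type Rep a
  to :: Rep a -> a

data Ze
data Su n
data Vec a n where
  Nil  :: Vec a Ze
  Cons :: a -> Vec a n -> Vec a (Su n)

data Vec_Nil
data Vec_Cons

type family X n
type instance X (Su m) = m   -- mobility rule

instance Representable (Vec a n) where
  type Rep (Vec a n) =
        CEq Vec_Nil  n Ze         U
    :+: CEq Vec_Cons n (Su (X n)) (Var a :*: Rec (Vec a (X n)))
  to (L (CEq U))                  = Nil
  to (R (CEq (Var h :*: Rec t))) = Cons h t

class GEnum' a where genum' :: [a]
class GEnum  a where genum  :: [a]

instance GEnum' U where genum' = [U]
instance GEnum a => GEnum' (Var a) where genum' = map Var genum
instance GEnum a => GEnum' (Rec a) where genum' = map Rec genum
instance (GEnum' a, GEnum' b) => GEnum' (a :+: b) where
  genum' = map L genum' ++ map R genum'
instance (GEnum' a, GEnum' b) => GEnum' (a :*: b) where
  genum' = [x :*: y | x <- genum', y <- genum']
-- Indices unify: build the constructor.
instance {-# OVERLAPPING #-} GEnum' a => GEnum' (CEq c p p a) where
  genum' = map CEq genum'
-- Indices differ: impossible case, nothing to enumerate.
instance GEnum' (CEq c p q a) where
  genum' = []

genumDefault :: (Representable a, GEnum' (Rep a)) => [a]
genumDefault = map to genum'

instance GEnum Bool where genum = [True, False]

-- One instance per shape of the index.
instance GEnum a => GEnum (Vec a Ze) where
  genum = genumDefault
instance (GEnum a, GEnum (Vec a n)) => GEnum (Vec a (Su n)) where
  genum = genumDefault

-- Hypothetical helpers for observing the result as plain lists.
vecToList :: Vec a n -> [a]
vecToList Nil         = []
vecToList (Cons x xs) = x : vecToList xs

lengthOne :: [[Bool]]
lengthOne = map vecToList (genum :: [Vec Bool (Su Ze)])
```

With a concrete index, each summand of the representation resolves to exactly one of the two CEq instances, so only the constructor compatible with the index contributes values; `lengthOne` corresponds to the two vectors named in the text.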

Instantiating Term. Instantiating GEnum for the Term datatype
follows a similar strategy. We must identify the types that Term is
indexed on. These are Int, Bool, and (α, β), in the Lit, IsZero, and
Pair constructors, respectively. The If constructor does not impose
any constraints on the index, and as such can be ignored for this
purpose. Having identified the possible types for the index, we give
an instance for each of these cases:

instance GEnum (Term Int) where
  genum = genumDefault

instance GEnum (Term Bool) where
  genum = genumDefault

instance (GEnum (Term α), GEnum (Term β))
    => GEnum (Term (α, β)) where
  genum = genumDefault

We can now enumerate arbitrary Terms:

(genum :: [Term (Int, Bool)]) !! 5  ⇝  Pair (Lit 0) (IsZero (Lit 5))

However, having to write the three instances above manually is 
still a repetitive and error-prone task; while the method is trivial 
(simply calling genumDefault), the instance head and context still 
have to be given, but these are determined entirely by the shape 
of the datatype. We have written Template Haskell (Sheard and
Peyton Jones 2002) code to automatically generate these instances
for the user. In this section and the previous we have seen how 
to encode and instantiate generic functions for indexed datatypes. 
In the next section we look at how we automate this process, by 
analyzing representation and instantiation in the general case. 

6. General representation and instantiation 

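In general, an indexed datatype has the following shape:

$$\begin{array}{rcl}
\mathbf{data}\ D\ \overline{\alpha} & = & \forall \overline{\beta_1}.\ \overline{\gamma_1} \Rightarrow C_1\ \overline{\phi_1} \\
 & \vdots & \\
 & | & \forall \overline{\beta_n}.\ \overline{\gamma_n} \Rightarrow C_n\ \overline{\phi_n}
\end{array}$$

We consider a datatype $D$ with arguments $\overline{\alpha}$ (which may or may not
be indices), and $n$ constructors $C_1 \ldots C_n$, with each constructor $C_i$
potentially introducing existentially-quantified variables $\overline{\beta_i}$, type
equalities $\overline{\gamma_i}$, and a list of arguments $\overline{\phi_i}$. We use an overline to denote
sequences of elements.

We need to impose some further restrictions on the types we are
able to handle:

1. Quantified variables are not allowed to appear as standalone
arguments to the constructor: $\forall_{i,\ \beta \in \overline{\beta_i}}.\ \beta \notin \overline{\phi_i}$.

2. Indices are not allowed to appear as standalone arguments to a
constructor: $\forall_{\alpha \in \overline{\alpha}}.\ \mathit{isIndex}\ \alpha\ D \rightarrow \forall_i.\ \alpha \notin \overline{\phi_i}$. We define isIndex
in Section 6.2.

3. Quantified variables have to appear in the equality constraints:
$\forall_{i,\ \beta \in \overline{\beta_i}}.\ \exists \psi.\ \psi\ \beta \in \overline{\gamma_i}$. We require this to provide the mobility
rules; in Section 7 we discuss how this restriction can be lifted.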

For such a datatype, we need to generate two types of code: 

1. The generic representation

2. The instances for generic instantiation

We deal with (1) in Section 6.1 and (2) in Section 6.2.

6.1 Generic representation 

Most of the code for generating the representation is not specific to
indexed datatypes; see, for instance, Magalhães et al. (2010) for a
formalization of a similar representation. What needs to be adapted
is the code generation for constructors, since now CEq takes two
extra type arguments. The value generation (functions from and to)
is not affected, only the representation type.

Type equalities. For each constructor $C_i$, an equality constraint
$\tau_1 \sim \tau_2 \in \overline{\gamma_i}$ becomes the second and third arguments to CEq, for
instance CEq ... $\tau_1$ $\tau_2$ .... Multiple constraints like $\tau_1 \sim \tau_2$, $\tau_3 \sim \tau_4$
become a product, as in ... $(\tau_1 \times \tau_3)$ $(\tau_2 \times \tau_4)$ .... An existentially-
quantified variable $\beta_i$ appearing on the right-hand side of a
constraint of the form $\tau \sim \ldots$ or on the arguments to $C_i$ is replaced by
$X\ \gamma\ \iota\ \tau$, with $\gamma$ the constructor representation type of $C_i$, and $\iota$ a
type-level natural version of $i$.

Mobility rules. For the generated type families we provide the
necessary mobility rules (Section 4.2.1). Given a constraint $\forall\overline{\beta_n}.\,\overline{\gamma_n}$,
each equality $\beta \sim \psi\,\tau$, where $\beta \in \overline{\beta}$ and $\psi\,\tau$ is some type expression
containing $\tau$, we generate a type instance $X\ \gamma\ \iota\ (\psi\,\tau) = \tau$, where
$\gamma$ is the constructor representation type of the constructor where
the constraint appears, and $\iota$ is a type-level natural encoding the
index of the constraint. As an example, for the Cons constructor of
Section 3, $\beta$ is $\nu$, $\tau$ is $\mu$, $\psi$ is Su, $\gamma$ is Vec_Cons, and $\iota$ is 0.
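
The mobility rule for Cons can be sketched as an open type family instance. In the sketch below, the family X takes the constructor representation type, a type-level natural and the index; Vec_Cons is the assumed name of the representation type for Cons, and the class Nat with its method natToInt is a hypothetical helper used only to observe that the family reduces as expected.

```haskell
{-# LANGUAGE TypeFamilies, ScopedTypeVariables #-}
import Data.Proxy (Proxy (..))

-- Type-level naturals.
data Ze
data Su n

-- Assumed name for the representation type of Vec's Cons constructor.
data Vec_Cons

-- X c n t: recover the n-th quantified variable of constructor c
-- from the index t.
type family X c n t

-- Mobility rule for Cons: from the index Su m we get back m.
type instance X Vec_Cons Ze (Su m) = m

-- Reflect a type-level natural as a value (helper for this sketch).
class Nat n where
  natToInt :: Proxy n -> Int

instance Nat Ze where
  natToInt _ = 0

instance Nat n => Nat (Su n) where
  natToInt _ = 1 + natToInt (Proxy :: Proxy n)

-- The index of the tail of a vector of length two, computed through X.
tailIndex :: Int
tailIndex = natToInt (Proxy :: Proxy (X Vec_Cons Ze (Su (Su Ze))))
```

The definition of tailIndex only type-checks because X Vec_Cons Ze (Su (Su Ze)) reduces to Su Ze, for which a Nat instance exists.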

6.2 Generic instantiation 

To give a more thorough account of the algorithm for generation of 
instances, we will sketch its implementation in Haskell. We assume 
the following representation of datatypes: 

data Datatype = Datatype [TyVar] [Con]
data Con      = Con Constraints [Type]

data Type   -- abstract
data TyVar  -- abstract

tyVarType :: TyVar -> Type

A datatype has a list of type variables as arguments, and a list 
of constructors. Constructors consist of constraints and a list of 
arguments. For our purposes, the particular representation of types 
and type variables is not important, but we need a way to convert 
type variables into types (tyVarType). 

Constraints are a list of (existentially-quantified) type variables 
and a list of type equalities: 

data Constraints = [TyVar] :> [TyEq]
data TyEq        = Type :~: Type

We will need equality on type equalities, so we assume some 
standard equality on types and type equalities. 

With this representation of datatypes we are ready to start the 
description of the algorithm for encoding indexed datatypes. We 
start by separating the datatype arguments into "normal" arguments 
and indices: 

findIndices :: Datatype -> [TyVar + TyVar]
findIndices (Datatype vs cs) =
  [ if v `inArgs` cs then L v else R v | v <- vs ]

inArgs :: TyVar -> [Con] -> Bool
inArgs = ...

We leave inArgs abstract, but its definition is straightforward: it
checks if the argument TyVar appears in any of the constructors as
an argument. In this way, findIndices tags normal arguments with
L and potential indices with R. These are potential indices because
they could also just be phantom types, which are not only not used
as arguments but also have no equality constraints. In any case, it
is safe to treat them as indices. The isIndex function used before is
defined in terms of findIndices:

isIndex :: TyVar -> Datatype -> Bool
isIndex t d = R t `elem` findIndices d
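
To see these definitions run, we can pick a concrete representation of types. The sketch below is one possible instantiation, not the one used in our implementation: it assumes that type variables are strings, that Type is a small first-order syntax tree, that Either stands in for the sum type (Left for L, Right for R), and that inArgs only looks for standalone occurrences of the variable among the constructor arguments.

```haskell
-- Assumed concrete representation of types and type variables.
type TyVar = String

data Type = TVar TyVar | TCon String [Type]
  deriving (Eq, Show)

data TyEq        = Type :~: Type
data Constraints = [TyVar] :> [TyEq]
data Con         = Con Constraints [Type]
data Datatype    = Datatype [TyVar] [Con]

-- Assumption: "appears as an argument" means a standalone argument.
inArgs :: TyVar -> [Con] -> Bool
inArgs v cs = or [ TVar v `elem` args | Con _ args <- cs ]

-- Left marks a normal argument, Right a potential index.
findIndices :: Datatype -> [Either TyVar TyVar]
findIndices (Datatype vs cs) =
  [ if v `inArgs` cs then Left v else Right v | v <- vs ]

isIndex :: TyVar -> Datatype -> Bool
isIndex t d = Right t `elem` findIndices d

-- data Vec a n = n ~ Ze => Nil | forall m. n ~ Su m => Cons a (Vec a m)
vec :: Datatype
vec = Datatype ["a", "n"]
  [ Con ([]    :> [TVar "n" :~: TCon "Ze" []]) []
  , Con (["m"] :> [TVar "n" :~: TCon "Su" [TVar "m"]])
        [TVar "a", TCon "Vec" [TVar "a", TVar "m"]]
  ]
```

For vec, findIndices tags a as a normal argument and n as a potential index, which matches the intended reading of the vector type.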

Having identified the indices, we want to identify all the return 
types of the constructors, as these correspond to the heads of the 
instances we need to generate. This is the task of function findRTs:

findRTs :: [TyVar] -> [Con] -> [Constraints]
findRTs is [] = []
findRTs is ((Con cts args) : cs) =
  let rs = findRTs is cs
  in  if anyIn is cts
        then cts : rs
        else rs

anyIn :: [TyVar] -> Constraints -> Bool
anyIn vs (_ :> teqs) = or [ v `inTyEq` teqs | v <- vs ]

inTyEq :: TyVar -> [TyEq] -> Bool
inTyEq = ...

We check the constraints in each constructor for the presence of
a type equality of the form i :~: t, for some index type variable i
and some type t. We rely on the fact that GADTs are converted
to type equalities of this shape; otherwise we should look for the
symmetric equality t :~: i too.

Having collected the important constraints from the construc- 
tors, we want to merge those with the same return type. Given the 
presence of quantified variables, this is not a simple equality test; 
we consider two constraints to be equal modulo all possible instan- 
tiations of the quantified variables: 

instance Eq Constraints where
  (vs :> cs) == (ws :> ds) =  length vs == length ws
                           && cs == subst ws vs ds

subst :: [TyVar] -> [TyVar] -> [TyEq] -> [TyEq]
subst vs ws teqs = ...

Two constraints are equal if they abstract over the same number of 
variables and their type equalities are the same, when the quantified 
variables of one of the constraints are replaced by the quantified 
variables of the other constraint. This replacement is performed by 
subst; we do not show its code since it is trivial (given a suitable 
definition of Type). 
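
For a concrete Type, subst can be written as a simultaneous renaming. The following sketch assumes the same string-based representation of type variables and a first-order Type as a stand-in for the abstract one; it ignores name capture, which is enough for illustration.

```haskell
import Data.Maybe (fromMaybe)

-- Assumed concrete representation of types and type variables.
type TyVar = String

data Type = TVar TyVar | TCon String [Type]
  deriving (Eq, Show)

data TyEq        = Type :~: Type deriving (Eq, Show)
data Constraints = [TyVar] :> [TyEq]

-- Rename each variable of vs to the variable at the same position in ws.
subst :: [TyVar] -> [TyVar] -> [TyEq] -> [TyEq]
subst vs ws = map (\(l :~: r) -> go l :~: go r)
  where
    renaming       = zip vs ws
    go (TVar v)    = TVar (fromMaybe v (lookup v renaming))
    go (TCon c ts) = TCon c (map go ts)

-- Equality modulo renaming of the quantified variables.
instance Eq Constraints where
  (vs :> cs) == (ws :> ds) =
    length vs == length ws && cs == subst ws vs ds

-- forall m. n ~ Su m  and  forall k. n ~ Su k  should be identified.
c1, c2, c3 :: Constraints
c1 = ["m"] :> [TVar "n" :~: TCon "Su" [TVar "m"]]
c2 = ["k"] :> [TVar "n" :~: TCon "Su" [TVar "k"]]
c3 = []    :> [TVar "n" :~: TCon "Ze" []]
```

Under this definition c1 and c2 are equal, while c3 differs from both because it quantifies over no variables.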

Merging constraints relies on constraint equality. Each con- 
straint is compared to every element in an already merged list of 
constraints, and merged if it is equal: 

merge :: Constraints -> [Constraints] -> [Constraints]
merge c1 [] = [c1]
merge c1@(vs :> cs) (c2@(ws :> ds) : css)
  | c1 == c2  = (vs :> (cs ++ subst ws vs ds)) : css
  | otherwise = c2 : merge c1 css

mergeConstraints :: [Constraints] -> [Constraints]
mergeConstraints = foldr merge []

We can now combine the functions above to collect all the 
merged constraints: 

rightOnly :: [α + β] -> [β]
rightOnly [] = []
rightOnly ((R a) : t) = a : rightOnly t
rightOnly (_ : t)     = rightOnly t

allConstraints :: Datatype -> [Constraints]
allConstraints d@(Datatype _ cons) =
  let is = rightOnly (findIndices d)
  in  mergeConstraints (findRTs is cons)

We know these constraints are of shape i :~: t, where i is an index
and t is some type. We need to generate instance heads of the form
instance G (D $\overline{\alpha}$), where $\overline{\alpha} \in$ buildInsts D. The function buildInsts
computes a list of type variable instantiations starting with the list
of datatype arguments, and instantiating them as dictated by the
collected constraints:

buildInsts :: Datatype -> [[Type]]
buildInsts d@(Datatype ts _) = map (instVar ts) cs
  where cs = concat (map (\(_ :> t) -> t) (allConstraints d))

instVar :: [TyVar] -> TyEq -> [Type]
instVar [] _ = []
instVar (v : vs) tEq@(i :~: t)
  | tyVarType v == i = t : map tyVarType vs
  | otherwise        = tyVarType v : instVar vs tEq

This completes our algorithm for generic instantiation to indexed 
datatypes. As mentioned before, the same analysis could be used 
to find out what Read instances are necessary for a given indexed 
datatype. Note however that this algorithm only finds the instance 
head for an instance; the method definition is trivial when it is a 
generic function. 
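
The last step can be tried out on the vector type. The sketch below assumes a string-based representation of type variables and a first-order Type; buildInstsFrom is a hypothetical variant of buildInsts that receives the already merged constraints as an argument, and pretty and instanceHeads are helpers introduced here to print the result. Instance contexts are left out.

```haskell
-- Assumed concrete representation of types and type variables.
type TyVar = String

data Type = TVar TyVar | TCon String [Type]
  deriving (Eq, Show)

data TyEq        = Type :~: Type
data Constraints = [TyVar] :> [TyEq]

tyVarType :: TyVar -> Type
tyVarType = TVar

-- Instantiate the first datatype argument that matches the index.
instVar :: [TyVar] -> TyEq -> [Type]
instVar [] _ = []
instVar (v : vs) tEq@(i :~: t)
  | tyVarType v == i = t : map tyVarType vs
  | otherwise        = tyVarType v : instVar vs tEq

-- Like buildInsts, but taking the merged constraints directly.
buildInstsFrom :: [TyVar] -> [Constraints] -> [[Type]]
buildInstsFrom ts merged =
  map (instVar ts) (concat [ eqs | _ :> eqs <- merged ])

pretty :: Type -> String
pretty (TVar v)    = v
pretty (TCon c []) = c
pretty (TCon c ts) = "(" ++ unwords (c : map pretty ts) ++ ")"

-- Render one instance head per instantiation of the datatype arguments.
instanceHeads :: String -> String -> [TyVar] -> [Constraints] -> [String]
instanceHeads cls ty ts merged =
  [ "instance " ++ cls ++ " " ++ pretty (TCon ty args)
  | args <- buildInstsFrom ts merged ]

-- Constraints of Vec: n ~ Ze for Nil, forall m. n ~ Su m for Cons.
vecConstraints :: [Constraints]
vecConstraints =
  [ []    :> [TVar "n" :~: TCon "Ze" []]
  , ["m"] :> [TVar "n" :~: TCon "Su" [TVar "m"]]
  ]
```

Calling instanceHeads "GEnum" "Vec" ["a", "n"] vecConstraints yields the two heads instance GEnum (Vec a Ze) and instance GEnum (Vec a (Su m)).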

7. Unrestricted indices 

In the previous section we have seen how to handle datatypes 
with indices generically. We imposed three restrictions on the da- 
tatypes we handle. Restrictions (1) and (2) deal with existentially- 
quantified arguments; it is not the aim of this paper to add generic 
support for existentially-quantified datatypes. We discuss this issue
in more detail in Section 9.1.

Restriction (3), on the other hand, seems a bit more arbitrary. A 
contrived example of a datatype which does not pass this restriction 
is the following: 

data Tag α where
  TagI :: Tag Int
  TagB :: Tag α -> Tag Bool

Let us first write Tag with explicit quantification and constraints: 

data Tag α =      α~Int  ⇒ TagI
           | ∀β. α~Bool ⇒ TagB (Tag β)

The constructor TagI simply sets the index α to Int. The constructor
TagB sets the index to Bool, and also takes an argument tagged with
any tag β. Valid values of type Tag are sequences of TagB's ending
with a TagI.
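
A small runnable sketch of this datatype, with its two constructors written TagI and TagB; the function depth and the value example are names introduced here for illustration.

```haskell
{-# LANGUAGE GADTs #-}

data Tag a where
  TagI :: Tag Int
  TagB :: Tag a -> Tag Bool

-- Number of TagB constructors wrapped around the final TagI.
-- The recursive call is at a different index, so the signature is required.
depth :: Tag a -> Int
depth TagI     = 0
depth (TagB t) = 1 + depth t

-- TagB accepts an argument with any index: Tag Int inside, Tag Bool outside.
example :: Tag Bool
example = TagB (TagB TagI)
```

Note that the inner TagB in example is applied to a Tag Int and the outer one to a Tag Bool; nothing relates the index of the argument to the index of the result.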

Although this use of indices appears simpler than others
we have shown before, the approach presented so far cannot handle 
these datatypes. Our first take would be to represent Tag as follows: 

data Tag_TagI
data Tag_TagB

instance Constructor Tag_TagI where conName _ = "TagI"
instance Constructor Tag_TagB where conName _ = "TagB"

instance Representable (Tag α) where
  type Rep (Tag α) =
      CEq Tag_TagI α Int  U
    + CEq Tag_TagB α Bool (Rec (Tag (X Tag_TagB 0 α)))

  from TagI     = L (CEq U)
  from (TagB c) = R (CEq (Rec c))

  to (L (CEq U))       = TagI
  to (R (CEq (Rec c))) = TagB c

But this code does not type-check: the second equation of from
introduces the constraint β ~ X Tag_TagB 0 Bool, which we cannot
prove. Since TagB does not put any constraints on β, we do not have
any mobility rule, so we have no type instance for X Tag_TagB 0 α.
The problem is that we have to come up with a particular
instantiation for X Tag_TagB 0 α, while the constructor TagB accepts
any index.

However, we can see from the declaration of Tag that the only
possible instantiations for β are Int and Bool. In fact, we have
already shown how to automatically determine the possible
instantiations of an index variable: this is the task of function
buildInsts of Section 6.2. Therefore we argue the following equality
should hold:

type instance X Tag_TagB 0 α = Int + Bool

However, this does not change the situation much: the second
equation of from now requires the type equality β ~ Int + Bool,
which we informally know is true, but cannot convince the type
checker of.

7.1 Digression: proper kinds 

In fact, all this trouble could be avoided if we had user-defined
kinds. We are using type families and GADTs to perform type-level
computations and make our programs very strongly typed.
This means we make the structure of possible values well-defined
and cleanly separated, but unfortunately the structure of the types
themselves is not so organized: all types belong to a single kind *.
This is clearly not what we want. Two examples follow:

• In Section 2.2 we say we define type classes to provide
instances for the generic representation types. However, nothing
prevents us from giving GEq' instances for any other types.
Also, the compiler does not check that we have given instances
for all the representation types; if we forget one instance, we
only get an error (and a slightly obscure one) when instantiating
the function to a particular type.

• When defining an indexed datatype, like Vec, we define its kind
as being * -> * -> *, while we want the second parameter to
be instantiated only with the types Ze or Su.

What we really need is the ability to create our own kinds and then
restrict type arguments based on their kinds. Our representation
types, for instance, should be grouped in a separate kind. We
illustrate this with a hypothetical new language feature:



kind *_R where
  U   :: *_R
  +   :: *_R -> *_R -> *_R
  ×   :: *_R -> *_R -> *_R
  CEq :: *_C -> * -> * -> *_R -> *_R
  Var :: * -> *_R
  Rec :: * -> *_R

We would also have a kind *_C containing only the generated
datatypes for the constructor information. Furthermore, we can say that
Var and Rec expect datatypes of kind * as their arguments.

A type class for defining a generic function would then explic- 
itly state the kind of its argument: 

class GEq' (α :: *_R) where ...

The type-checker could then check that all necessary instances 
were given. 

Indexed datatypes could give proper kinds to their indices: 

kind *_T where
  Int  :: *_T
  Bool :: *_T

data Tag :: *_T -> * where
  TagI :: Tag Int
  TagB :: ∀β :: *_T. Tag β -> Tag Bool

Note how TagB now constrains its argument to kind *_T, effectively
stating that it can only be Int or Bool. We would then hopefully be
able to handle Tag generically by using kind genericity (albeit in
a different style from Hinze (2002), as our kind structure is now
richer).

However, while this style of programming is common practice
in Ωmega (Sheard 2006), or in dependently-typed languages such
as Agda (Norell 2007), it remains unclear how to integrate such
features in Haskell, in particular regarding type and kind checking,
inference, and interaction with all the other language features.
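
For comparison, the DataKinds extension available in later versions of GHC supports something close to the hypothetical declarations above. The sketch below is an approximation rather than the feature we describe: the index kind Ix and its members IxInt and IxBool are new names standing in for a kind containing only Int and Bool, and depth is an illustrative helper.

```haskell
{-# LANGUAGE DataKinds, GADTs, KindSignatures #-}

-- Promoted by DataKinds to a kind Ix with exactly two types.
data Ix = IxInt | IxBool

-- The index of Tag can now only be 'IxInt or 'IxBool;
-- a type such as Tag Char is rejected by the kind checker.
data Tag (i :: Ix) where
  TagI :: Tag 'IxInt
  TagB :: Tag i -> Tag 'IxBool

depth :: Tag i -> Int
depth TagI     = 0
depth (TagB t) = 1 + depth t
```

The argument of TagB is still universally quantified, but it now ranges over a closed kind with two inhabitants, which is the property the informal argument of this section relies on.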

7.2 Unsafe coercions 

Lacking better mechanisms for type-safe programming, we are
left with telling the compiler it should just accept our code and
stop complaining. In the second equation of from, the argument c
passed to Rec needs a cast: in the code we actually write
unsafeCoerce c, where unsafeCoerce :: α -> β is a Haskell
primitive to cast between values that are known to be equivalent.

Using unsafeCoerce means effectively abandoning the safety 
of Haskell; in theory we are now prone to runtime errors caused 
by using arguments of the wrong type. However, we are certain 
that our cast will not cause any problems because the variable 
is never used for purposes other than type-checking. In fact, as 
soon as we introduce the cast, the specific right-hand side of the
type instance X Tag_TagB 0 α is not important; the instance might
even be absent altogether. The consequence is that the generic
representation type Rep (Tag α) is not isomorphic to Tag α;
ignoring undefined values, Tag α might contain more inhabitants than
Rep (Tag α), depending on the instance for X Tag_TagB 0 α.

Consider the possible instantiation type instance X Tag_TagB 0 α =
Int. In this case, it is clear that Tag α contains more
inhabitants than its representation, as the value TagB (TagB TagI),
for instance, cannot be represented. Its representation would be
R (CEq (Rec (TagB TagI))), but this does not type-check because
we cannot prove that Tag Int ~ Tag Bool. However, we know we
only have values of the generic representation types in two cases:

1. Converted with from from a user datatype value; 

2. Produced from a generic producer such as genum. 

Case (1) is handled by the unsafe cast: values of type Tag are, 
by definition, type-correct and can be converted to the generic 
representation because we cast them. Case (2) is more problematic: 
in the concrete case of genum, it means we might not be generating 
all possible values. This can happen because we first generate 
values of the generic representation type and then convert these 
with to. If the representation type is "too small", we might not 
generate all possible cases. 
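
The conversion functions with the cast in place can be sketched as follows. This is a simplified model and not the library code: U, Rec and CEq are minimal stand-ins for the representation types, Either plays the role of the sum, Rep is a plain type synonym instead of an associated type, and the family X is deliberately left without an instance for Tag_TagB.

```haskell
{-# LANGUAGE GADTs, TypeFamilies #-}
import Unsafe.Coerce (unsafeCoerce)

data Tag a where
  TagI :: Tag Int
  TagB :: Tag a -> Tag Bool

-- Simplified stand-ins for the representation types (assumptions).
data U = U
newtype Rec a = Rec a
data CEq c p q a where
  CEq :: a -> CEq c p p a   -- only constructible when both indices agree

data Ze                     -- type-level zero
data Tag_TagI               -- constructor representation types
data Tag_TagB
type family X c n t         -- no instance is given for Tag_TagB

type Rep a = Either (CEq Tag_TagI a Int U)
                    (CEq Tag_TagB a Bool (Rec (Tag (X Tag_TagB Ze a))))

from :: Tag a -> Rep a
from TagI     = Left  (CEq U)
from (TagB c) = Right (CEq (Rec (unsafeCoerce c)))  -- cast changes only the index

to :: Rep a -> Tag a
to (Left  (CEq U))       = TagI
to (Right (CEq (Rec c))) = TagB c

depth :: Tag a -> Int
depth TagI     = 0
depth (TagB t) = 1 + depth t
```

In this model a round trip through from and to preserves the structure of a value, for example its depth, even though X has no instance; this reflects the observation that the right-hand side of the type instance does not matter once the cast is introduced.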

7.3 Generic instantiation 

We can verify this easily by instantiating GEnum to Tag: 

instance GEnum (Tag Int) where
  genum = genumDefault

instance GEnum (Tag Bool) where
  genum = genumDefault

At this stage, the type-checker complains we lack an instance for
GEnum (Tag (X Tag_TagB 0 α)). This is to be expected, as we use
Rec (Tag (X Tag_TagB 0 α)) in the generic representation. We add
such an instance:

instance GEnum (Tag (Int + Bool)) where
  genum = genumDefault

Note that we cannot use a type family in the instance head, so
we replace X Tag_TagB 0 α by its right-hand side Int + Bool.
This code type-checks correctly, but genum :: [Tag Bool] returns
[]. This should not be really surprising: the implementation of
genumDefault hits an instance GEnum' (CEq Tag_TagI Int (Int +
Bool) α) and an instance GEnum' (CEq Tag_TagB Bool (Int + Bool) α);
both return the empty list.



What we want is to mimic the behavior of GEnum' for sums in 
our GEnum (Tag (Int + Bool)) instance: 

instance GEnum (Tag (Int + Bool)) where
  genum = gi ++ gb where
    gi = genum :: [Tag Int]
    gb = genum :: [Tag Bool]

The enumeration now works as expected, with take 2 (genum ::
[Tag Bool]) returning [TagB TagI, TagB (TagB TagI)]. However, we
are forced to use an unsafe cast one more time to convince the
type checker to accept returning two lists of different types. We
know this is "ok" because Int and Bool are the only two possible
instantiations of the index of Tag, so enumerating all possible terms
for both indices and merging is the right implementation.
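
The merging instance, including its casts, can be sketched in a self-contained way. The code below makes several assumptions: GEnum is a hypothetical one-method class, the instances for Tag Int and Tag Bool are written by hand rather than through genumDefault, Either Int Bool stands in for Int + Bool, and depth is an illustrative helper.

```haskell
{-# LANGUAGE GADTs, FlexibleInstances #-}
import Unsafe.Coerce (unsafeCoerce)

data Tag a where
  TagI :: Tag Int
  TagB :: Tag a -> Tag Bool

-- Hypothetical stand-in for the library's GEnum class.
class GEnum a where
  genum :: [a]

instance GEnum (Tag Int) where
  genum = [TagI]

-- The argument of TagB is enumerated at the merged index.
instance GEnum (Tag Bool) where
  genum = map TagB (genum :: [Tag (Either Int Bool)])

-- Merge the enumerations for both indices; each cast changes only the index.
instance GEnum (Tag (Either Int Bool)) where
  genum = map unsafeCoerce gi ++ map unsafeCoerce gb
    where
      gi = genum :: [Tag Int]
      gb = genum :: [Tag Bool]

depth :: Tag a -> Int
depth TagI     = 0
depth (TagB t) = 1 + depth t
```

In this sketch, genum :: [Tag Bool] produces TagB TagI first and then values of increasing depth. Appending works here because the enumeration for Tag Int is finite; a fair interleaving would be needed if both lists were infinite.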

Lastly, it is worth mentioning that generic consumers such as 
GEq are not affected by our casts, and their instantiation remains 
trivial: 

instance GEq (Tag α) where
  geq = geqDefault

7.4 Reflection 

Circumventing the type system by using unsafeCoerce is not
something to be taken lightly. Although we have explained informally
why things cannot "go wrong" with our casts, we would still much
prefer a well-kinded solution along the lines of Section 7.1. We do
not consider the techniques described in this section to be part of the
library, and we do not provide Template Haskell code to
automatically introduce unsafe casts for the user. Instead, with this section
we aim only at pointing out the feasibility of generic programming
for unrestricted indices, and providing a foundation for more
structured future work.

8. Related work 

Indexed datatypes can be seen as a subset of all GADTs, or as
existentially-quantified datatypes using type-level equalities.
Johann and Ghani (2008) developed categorical semantics of GADTs,
including initial algebra semantics. While this allows for a better
understanding of GADTs from a generic perspective, it does not
translate directly to an intuitive and easy-to-use generic library.

Gibbons (2008) describes how to view abstract datatypes as
existentially-quantified, and uses final coalgebra semantics to
reason about such types. Rodriguez Yakushev and Jeuring (2010)
describe an extension to the spine view (Hinze et al. 2006)
supporting existential datatypes. Both approaches focus on
existentially-quantified data, whereas we do not consider this case at
all, instead focusing on (potentially existentially-quantified) indices.
See Section 9.1 for a further discussion on this issue.

Within dependently-typed programming, indexing is an ordinary
language feature which can be handled generically more easily
due to the presence of type-level lambdas and explicit type
application (e.g. Chapman et al. (2010); Morris (2007)).

9. Future work 

In this section we discuss two possible extensions to the techniques 
described in the paper. Both extensions further increase the number 
of datatypes that can be handled generically using the library. 

9.1 Existentials as data 

While we can express indexed datatypes as GADTs or existentially- 
quantified datatypes with type-level equalities, the reverse is not 
true in general. Consider the type of dynamic values: 

data Dynamic = ∀α. Typeable α ⇒ Dyn α

The constructor Dyn stores a value on which we can use the
operations of type class Typeable, which is all we know about this value.
In particular, its type is not visible "outside", since Dynamic has no
type variables. Another example is the following variation of Term:

data Term α where
  Const :: α -> Term α
  Pair  :: Term α -> Term β -> Term (α, β)
  Fst   :: Term (α, β) -> Term α
  Snd   :: Term (α, β) -> Term β

Here, the type argument α is not only an index but is also used
as data, since values of its type appear in the Const constructor.
Our approach cannot currently deal with such datatypes. We plan
to investigate if we can build upon the work of Rodriguez Yakushev
and Jeuring (2010) to also support existentials when used as data.
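
The type of dynamic values mentioned at the start of this section can be written in concrete syntax as follows. The functions fromDyn and describe are helpers added for this sketch; they use only cast and typeOf from Data.Typeable.

```haskell
{-# LANGUAGE ExistentialQuantification #-}
import Data.Typeable (Typeable, cast, typeOf)

-- The stored type is hidden: Dynamic has no type parameters.
data Dynamic = forall a. Typeable a => Dyn a

-- Recover the value only if the requested type matches the stored one.
fromDyn :: Typeable b => Dynamic -> Maybe b
fromDyn (Dyn x) = cast x

-- The Typeable dictionary is all we know about the stored value.
describe :: Dynamic -> String
describe (Dyn x) = show (typeOf x)
```

Since the hidden type is not related to any index of Dynamic, there is no equality constraint from which it could be recovered, which is why the technique of the previous sections does not apply here.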

9.2 Kind-generic programming with user-defined kinds 

As discussed in Section 7.1, having user-defined kinds would
allow for much safer type-level programming and more expressive
programs. However, it remains to be seen how user-defined kinds 
can be introduced in Haskell, and how they can be used for generic 
programming. This is a promising direction for future research. 

10. Conclusion 

In this paper we have seen how to increase the expressiveness of 
a generic programming library by adding support for indexed
datatypes. We have used the instant-generics library for
demonstrative purposes, but we believe the technique readily generalizes
to all other generic programming libraries using type-level generic 
representation and type classes. We have shown how indexing can 
be reduced to type-level equalities and existential quantification. 
The former is easily encoded in the generic representation, and the 
latter can be handled by encoding the restrictions on the quantified 
variables as relations to the datatype index. All together, our work 
brings the convenience and practicality of datatype-generic pro- 
gramming to the world of indexed datatypes, widely used in many 
applications but so far mostly ignored by boilerplate-removing 
strategies. We also hope to have illustrated the need and the
potential advantages of a better kind system for Haskell.